Most of what we need for a Bayesian regression is the same as for a Frequentist regression, except the priors
❓How can we check whether our priors are appropriate?
🤔 How do we use distributions of parameters (prior or posterior) to make predictions?
Two sources of variability:
Distributional (Prior or Posterior) Variability
Sampling (Data) Variability
How certain are we about the value of the parameters?
How much will observed data typically vary from its expected value? a.k.a. inherent variability in the data around the true value.
\[ p(\tilde{x} \mid x) \]
Probability of new sample \(\tilde{x}\) given the current data \(x\).
1. draw \(\theta_i \sim \pi(\theta)\) (one sample of all the model parameters)
2. draw \(y_i \sim p(y_i \mid \theta_i)\) (a sample of data from the model implied by \(\theta_i\))
Step 1 accounts for uncertainty about the parameters \(\theta\); step 2 accounts for uncertainty in the sample.
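These two steps can be sketched in base R for a toy model (a normal mean with a \(\mathcal{N}(0, 5)\) prior; an illustration, not a model from these slides):

```r
# toy model: y ~ Normal(mu, 1), with prior mu ~ Normal(0, 5)
set.seed(1)
n_draws <- 1000
mu_draws <- rnorm(n_draws, mean = 0, sd = 5)        # step 1: sample parameters from the prior
y_draws  <- rnorm(n_draws, mean = mu_draws, sd = 1) # step 2: sample data given each parameter draw
# the spread of y_draws combines both sources of variability:
sd(y_draws)  # close to sqrt(5^2 + 1^2), i.e. about 5.1
```

The standard deviation of the simulated data is larger than either source alone, because the draws mix parameter (distributional) variability with sampling variability.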
Prior predictive checks allow us to simulate samples from our prior and make sure it lines up with our expectations about reality.
Let’s adjust our priors to be more realistic.
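One way to run a prior predictive check in brms is to sample from the priors alone, ignoring the likelihood (a sketch using the mtcars model fit below):

```r
# fit the model using only the priors -- the data are ignored during sampling
m_prior <- brm(mpg ~ hp + gear, data = mtcars,
               sample_prior = "only")
# overlay prior-simulated outcomes on the observed mpg values
pp_check(m_prior, ndraws = 100)
```

If the prior-simulated mpg values are wildly implausible (negative, or in the thousands), that is a signal to tighten the priors before fitting.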
Posterior predictive checks do something similar, but simulate new data based on the posterior. This allows us to “look for systematic discrepancies between real and simulated data” (Gelman et al. 2004), which tells us a little about model fit.
Remember, p-values tell you:
given a distribution \(\pi(\theta)\), what is the probability of observing a \(\theta\) as or more extreme than the one we did? In other words, how consistent is \(\theta\) with \(\pi(\theta)\)?
# generate samples from posterior
yrep <- posterior_predict(m1, draws = 500)

# overall mean function
overall_mu <- function(x) mean(x)

# calc for actual data
overall_mu(df$y)
[1] 0.236782
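To complete the check, we can compare this observed value to the same statistic computed on each simulated dataset (a sketch, assuming the `yrep` matrix and `overall_mu` from above):

```r
# each row of yrep is one simulated dataset; compute the mean of each
yrep_mu <- apply(yrep, 1, overall_mu)
# proportion of simulated means at least as large as the observed mean
mean(yrep_mu >= overall_mu(df$y))
# a value near 0 or 1 suggests the model systematically mis-predicts this statistic
```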
mtcars Data
Rows: 32
Columns: 11
$ mpg <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8,…
$ cyl <dbl> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, 4, 4, 8,…
$ disp <dbl> 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 140.8, 16…
$ hp <dbl> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 180, 180…
$ drat <dbl> 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92, 3.92,…
$ wt <dbl> 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3.150, 3.…
$ qsec <dbl> 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 22.90, 18…
$ vs <dbl> 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0,…
$ am <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0,…
$ gear <dbl> 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3, 3,…
$ carb <dbl> 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, 1, 1, 2,…
Family: gaussian
Links: mu = identity; sigma = identity
Formula: mpg ~ hp + gear
Data: mtcars (Number of observations: 32)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 18.02 3.32 11.49 24.57 1.00 4135 3048
hp -0.06 0.01 -0.08 -0.05 1.00 4546 2822
gear 3.11 0.77 1.55 4.64 1.00 4361 2848
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 3.22 0.44 2.48 4.21 1.00 3854 3258
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
(more on ROPE with a brms model here)
# s/o to ChatGPT for helping simulate the data
# libs
library(brms)
library(bayesplot)
library(tidybayes)
# sim data
set.seed(540)
n <- 200
parental_income <- rnorm(n, mean = 50, sd = 10) # income_in_k
z <- 1 + 0.05 * parental_income + rnorm(n, 0, 1)
p <- 1 / (1 + exp(-z))
passed_exam <- rbinom(n, 1, p)
df <- data.frame(parental_income, passed_exam)
# priors
priors <- c(
prior(normal(0, 5), class = "b"),
prior(normal(0, 5), class = "Intercept")
)
# fit
fit <- brm(passed_exam ~ parental_income, data = df,
family = bernoulli(), prior = priors,
seed = 123, chains = 4, cores = 4, iter = 2000)
Family: bernoulli
Links: mu = logit
Formula: passed_exam ~ parental_income
Data: df (Number of observations: 200)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 1.28 1.69 -2.00 4.56 1.00 2279 2473
parental_income 0.04 0.04 -0.03 0.11 1.00 2082 2141
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
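The logit-scale coefficients above are hard to read directly. As a quick sketch, plugging the posterior means from the table into the inverse logit gives a predicted passing probability at a parental income of 50 (i.e., $50k):

```r
# linear predictor at parental_income = 50, using posterior means from the summary
eta <- 1.28 + 0.04 * 50
plogis(eta)  # inverse logit; roughly 0.96
```

This is a point sketch only; for full uncertainty we would apply `plogis()` to every posterior draw rather than to the posterior means.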
# s/o to ChatGPT for helping simulate the data
# Load required libraries
library(brms)
library(bayesplot)
library(tidybayes)
# sim
set.seed(123)
n_schools <- 10
n_students <- 20
total_n <- n_schools * n_students
school <- factor(rep(1:n_schools, each = n_students))
parental_income <- rnorm(total_n, mean = 50, sd = 10) # Parental income in thousands of dollars
school_effect <- rnorm(n_schools, 0, 5)[school]
individual_error <- rnorm(total_n, 0, 5)
exam_score <- 50 + 0.5 * parental_income + school_effect + individual_error
df <- data.frame(school, parental_income, exam_score)
# priors
priors <- c(
prior(normal(0, 10), class = "b"),
prior(normal(50, 10), class = "Intercept"),
prior(exponential(1), class = "sd")
)
# fit
fit <- brm(exam_score ~ parental_income + (1 + parental_income | school),
data = df, family = gaussian(), prior = priors,
seed = 123, chains = 4, cores = 4, iter = 2000)
Family: gaussian
Links: mu = identity; sigma = identity
Formula: exam_score ~ parental_income + (1 + parental_income | school)
Data: df (Number of observations: 200)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Multilevel Hyperparameters:
~school (Number of levels: 10)
Estimate Est.Error l-95% CI u-95% CI Rhat
sd(Intercept) 1.07 1.01 0.02 3.82 1.00
sd(parental_income) 0.11 0.04 0.06 0.20 1.01
cor(Intercept,parental_income) 0.05 0.56 -0.93 0.95 1.01
Bulk_ESS Tail_ESS
sd(Intercept) 1598 1122
sd(parental_income) 659 1090
cor(Intercept,parental_income) 420 847
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 51.81 1.94 48.04 55.71 1.00 5579 2443
parental_income 0.48 0.05 0.37 0.57 1.00 1501 1300
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 4.93 0.25 4.46 5.46 1.00 4824 2748
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
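With tidybayes (loaded above) we can pull tidy posterior draws out of the fitted model; a minimal sketch, assuming the `fit` object above:

```r
# posterior median and 95% interval for the income slope
fit |>
  spread_draws(b_parental_income) |>
  median_qi(b_parental_income)
```

The same pattern extends to the school-level effects, which is handy for plotting per-school slopes with their uncertainty.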
Now that we’ve seen how to fit Bayesian regression models, Frequentist models will be a walk in the park! We don’t have to think about priors, or prior predictive checks…!
But we do need to think about how to read the results.
t-test
z-test
chi-square test
z-statistic:
\[ z = \frac{\bar{x} - \mu_0}{se} \]
where \(\bar{x}\) is the observed sample statistic, \(\mu_0\) is the sample statistic expected under \(H_0\), and \(se\) is the sample standard error.
z-statistics measure how far away the sample statistic is from the expected \(H_0\) statistic. It’s a z-score using the null distribution \(\mathcal{N}(\mu_0, se)\)
❓when sample statistics are compatible with \(H_0\), what type of z-statistics would you expect to see?
Because we divide by \(se\), z-statistics are standardized, meaning that we can use the same standard cutoffs to create Neyman-Pearson rejection regions! z-statistics that are more extreme than \(z = \pm 1.645\) account for the 10% most extreme values. For the 5% most extreme we use \(z = \pm 1.96\).
In a Fisherian or NHST framework, the z-distribution can also help us calculate p-values. Here \(p = 0.7\).
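With made-up numbers (not the slide's data), the calculation is just:

```r
# z-statistic and two-sided p-value by hand (toy numbers)
xbar <- 175.5   # observed sample mean
mu0  <- 175     # mean expected under H0
se   <- 1.3     # standard error
z <- (xbar - mu0) / se
z                   # about 0.38: well inside +/- 1.96
2 * pnorm(-abs(z))  # two-sided p-value, about 0.70
```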
Common z-tests:
z-tests for means (known \(\sigma\))
z-tests for proportions (believe it or not, \(\sigma\) also known)
t-statistic:
\[ t = \frac{\bar{x} - \mu_0}{se} \]
where \(\bar{x}\) is the observed sample statistic, \(\mu_0\) is the sample statistic expected under \(H_0\), and \(se\) is the standard error.
t-statistics also measure how far away the sample statistic is from the expected \(H_0\) statistic.
❓when sample statistics are compatible with \(H_0\), what type of t-statistics would you expect to see?
Because we divide by \(se\), t-scores are standardized, meaning that we can use the same standard cutoffs (depending on df) to create Neyman-Pearson rejection regions!
In a Fisherian or NHST framework, the t-distribution can also help us calculate p-values.
Common t-tests:
t-tests for mean (unknown \(\sigma\))
t-test for regression coefficients
t-test for difference between means
# is the mean weight of bumblebees in florida different than the mean bumblebee weight of 175mg?
t.test(data,
alternative = "two.sided",
mu = 175)
One Sample t-test
data: data
t = -0.3004, df = 99, p-value = 0.7645
alternative hypothesis: true mean is not equal to 175
95 percent confidence interval:
172.7395 176.6660
sample estimates:
mean of x
174.7028
library(TOSTER)
tsum_TOST(m1 = mean(data),
mu = 175,
sd1 = sd(data),
n1 = length(data),
low_eqbound = 170,
high_eqbound = 180)
One-sample t-test
The equivalence test was significant, t(99) = 4.753, p = 3.4e-06
The null hypothesis test was non-significant, t(99) = -0.300, p = 7.65e-01
NHST: don't reject null significance hypothesis that the effect is equal to 175
TOST: reject null equivalence hypothesis
TOST Results
t df p.value
t-test -0.3004 99 0.765
TOST Lower 4.7530 99 < 0.001
TOST Upper -5.3538 99 < 0.001
Effect Sizes
Estimate SE C.I. Conf. Level
Raw -0.2972 0.9894 [173.0599, 176.3456] 0.9
Hedges's g 17.5226 1.2431 [15.4236, 19.5332] 0.9
Note: SMD confidence intervals are an approximation. See vignette("SMD_calcs").
# is the difference in mean weights of bumblebees in florida and california different than 0?
t.test(x = data1,
y = data2,
alternative = "two.sided")
Welch Two Sample t-test
data: data1 and data2
t = -4.622, df = 195.18, p-value = 6.898e-06
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-8.118912 -3.262520
sample estimates:
mean of x mean of y
174.5142 180.2050
city temperature rainfall bee_weight
1 City 1 24.64579 933.8643 9.294917
2 City 1 22.91725 1049.5396 3.650119
# fit model
m1 <- lm(bee_weight ~ rainfall + temperature,
data = simulated_data)
# regression table
summary(m1)
Call:
lm(formula = bee_weight ~ rainfall + temperature, data = simulated_data)
Residuals:
Min 1Q Median 3Q Max
-7.0941 -1.3528 -0.0109 1.3792 8.6508
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.0597404 0.1251260 40.44 <2e-16 ***
rainfall -0.0050555 0.0001271 -39.78 <2e-16 ***
temperature 0.2991710 0.0045176 66.22 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.999 on 9997 degrees of freedom
Multiple R-squared: 0.3112, Adjusted R-squared: 0.311
F-statistic: 2258 on 2 and 9997 DF, p-value: < 2.2e-16
university age group gender caffeine_mg gestures
1 University 3 25.11077 Group B Woman 151.92202 43
2 University 3 23.72048 Group B Woman 74.42176 29
# does caffeine intake influence the number of gestures used in conversation?
library(lme4)
# z score
data <- data |> mutate(across(all_of(c("age", "caffeine_mg")),
~ (.-mean(.)) / sd(.)))
m2 <- glmer(gestures ~ age + gender + caffeine_mg + (1 | university),
data = data,
family = poisson)
m2 |> summary()
Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
Family: poisson ( log )
Formula: gestures ~ age + gender + caffeine_mg + (1 | university)
Data: data
AIC BIC logLik -2*log(L) df.resid
641.5 657.1 -314.7 629.5 94
Scaled residuals:
Min 1Q Median 3Q Max
-2.42780 -0.57690 0.04138 0.61653 2.07566
Random effects:
Groups Name Variance Std.Dev.
university (Intercept) 0.0001464 0.0121
Number of obs: 100, groups: university, 5
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.72267 0.03285 113.340 <2e-16 ***
age -0.02396 0.01694 -1.415 0.1572
genderNon-binary -0.10289 0.04393 -2.342 0.0192 *
genderWoman -0.09640 0.04279 -2.253 0.0243 *
caffeine_mg 0.04686 0.01654 2.834 0.0046 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) age gndrN- gndrWm
age -0.225
gndrNn-bnry -0.761 0.250
genderWoman -0.777 0.237 0.617
caffeine_mg 0.105 0.029 -0.146 -0.119
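Since the Poisson model uses a log link, the coefficients above are on the log scale; exponentiating turns them into multiplicative effects on the expected count:

```r
# rate ratio for caffeine_mg (which was z-scored above):
# a 1-SD increase in caffeine multiplies the expected gesture count by ~1.05
exp(0.04686)
# the gender contrasts work the same way:
exp(-0.09640)  # ~0.91, i.e. about 9% fewer gestures than the reference level
```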
palmerpenguins data
Call:
lm(formula = body_mass_g ~ flipper_length_mm, data = penguins_clean)
Residuals:
Min 1Q Median 3Q Max
-1057.33 -259.79 -12.24 242.97 1293.89
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -5872.09 310.29 -18.93 <2e-16 ***
flipper_length_mm 50.15 1.54 32.56 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 393.3 on 331 degrees of freedom
Multiple R-squared: 0.7621, Adjusted R-squared: 0.7614
F-statistic: 1060 on 1 and 331 DF, p-value: < 2.2e-16
Call:
glm(formula = sex ~ body_mass_g + flipper_length_mm + bill_length_mm,
family = binomial(link = "logit"), data = penguins_clean)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 8.2569324 2.7636399 2.988 0.00281 **
body_mass_g 0.0029604 0.0004188 7.069 1.56e-12 ***
flipper_length_mm -0.1348614 0.0234043 -5.762 8.30e-09 ***
bill_length_mm 0.1458945 0.0332279 4.391 1.13e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 461.61 on 332 degrees of freedom
Residual deviance: 350.73 on 329 degrees of freedom
AIC: 358.73
Number of Fisher Scoring iterations: 4
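As with the Poisson model, the logit-scale coefficients are easier to read as odds ratios; for example, rescaling the body mass coefficient (which is per gram) to a 100 g change:

```r
# odds ratio for a 100 g increase in body mass
exp(0.0029604 * 100)  # ~1.34: each 100 g multiplies the odds of the modeled outcome level by ~1.34
```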
brms
Family: gaussian
Links: mu = identity; sigma = identity
Formula: body_mass_g ~ flipper_length_mm
Data: penguins_clean (Number of observations: 333)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -5876.93 313.54 -6487.91 -5265.71 1.00 3759 2684
flipper_length_mm 50.18 1.56 47.16 53.19 1.00 3800 2764
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 394.83 15.39 366.06 426.70 1.00 4058 3059
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Chain 4: Iteration: 1 / 2000 [ 0%] (Warmup)
Chain 4: Iteration: 200 / 2000 [ 10%] (Warmup)
Chain 4: Iteration: 400 / 2000 [ 20%] (Warmup)
Chain 4: Iteration: 600 / 2000 [ 30%] (Warmup)
Chain 4: Iteration: 800 / 2000 [ 40%] (Warmup)
Chain 4: Iteration: 1000 / 2000 [ 50%] (Warmup)
Chain 4: Iteration: 1001 / 2000 [ 50%] (Sampling)
Chain 4: Iteration: 1200 / 2000 [ 60%] (Sampling)
Chain 4: Iteration: 1400 / 2000 [ 70%] (Sampling)
Chain 4: Iteration: 1600 / 2000 [ 80%] (Sampling)
Chain 4: Iteration: 1800 / 2000 [ 90%] (Sampling)
Chain 4: Iteration: 2000 / 2000 [100%] (Sampling)
Chain 4:
Chain 4: Elapsed Time: 0.368 seconds (Warm-up)
Chain 4: 0.167 seconds (Sampling)
Chain 4: 0.535 seconds (Total)
Chain 4:
Family: bernoulli
Links: mu = logit
Formula: sex ~ body_mass_g + flipper_length_mm + bill_length_mm
Data: penguins_clean (Number of observations: 333)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 8.34 2.73 3.11 13.77 1.00 2144 2165
body_mass_g 0.00 0.00 0.00 0.00 1.00 2102 2259
flipper_length_mm -0.14 0.02 -0.18 -0.09 1.00 1624 1864
bill_length_mm 0.15 0.03 0.08 0.21 1.00 1800 1655
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
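Because the link is the logit, these coefficients are log-odds. A quick back-of-the-envelope transformation of the posterior means printed above gives odds ratios (note `body_mass_g` prints as 0.00 only because of rounding, and which level of `sex` is modeled depends on the factor coding):

```r
# Posterior-mean coefficients from the summary above (log-odds scale)
b <- c(flipper_length_mm = -0.14, bill_length_mm = 0.15)

# Odds ratios per 1 mm increase in each predictor
exp(b)
# flipper_length_mm ~ 0.87: each extra mm multiplies the odds of the
#   modeled sex level by ~0.87
# bill_length_mm    ~ 1.16
```

A fuller answer would exponentiate every posterior draw (not just the mean) to get a posterior distribution of odds ratios.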
Family: gaussian
Links: mu = identity; sigma = identity
Formula: body_mass_g ~ flipper_length_mm + (1 | species) + (1 | island)
Data: penguins_clean (Number of observations: 333)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Multilevel Hyperparameters:
~island (Number of levels: 3)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 143.71 153.74 4.75 603.97 1.00 874 862
~species (Number of levels: 3)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 385.51 259.46 96.05 1103.21 1.00 1038 563
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -4199.97 674.25 -5501.69 -2829.28 1.00 2037 1795
flipper_length_mm 41.63 3.11 35.57 47.91 1.00 2587 2331
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 374.40 14.93 347.03 405.79 1.00 2924 2316
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
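As a rough sketch of how to use these estimates, plugging the posterior-mean coefficients into the population-level line gives an expected body mass. This ignores the species/island intercepts and all parameter uncertainty; a full answer would propagate the posterior draws (e.g. via `posterior_epred()`):

```r
# Population-level posterior means from the summary above
b0 <- -4199.97   # Intercept
b1 <- 41.63      # flipper_length_mm slope

flipper <- 200              # hypothetical penguin, flipper length in mm
mu <- b0 + b1 * flipper     # expected body mass in grams
mu                          # ~4126 g
```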
Choose whether you want to answer this with a Bayesian or Frequentist framework.
Research Questions: What is the effect of sleep deprivation on reaction time? Is the effect consistent across people, or does it vary a lot from person to person?
    Reaction Days Subject
1   249.5600    0     308
2   258.7047    1     308
3   250.8006    2     308
4   321.4398    3     308
5   356.8519    4     308
6   414.6901    5     308
7   382.2038    6     308
8   290.1486    7     308
9   430.5853    8     308
10  466.3535    9     308
… (180 rows in total: 18 subjects × 10 days of sleep deprivation; remaining rows omitted)
\(\chi^2\)-statistic:
\[ \chi^2 = \sum \frac{(o-e)^2}{e}\\ \underbrace{\chi = \frac{(o-e)}{\sqrt{e}}}_\text{for demo purposes only} \]
where \(o\) is an observed count, \(e\) is the count expected under \(H_0\), and the sum runs over all categories (cells)
\(\chi^2\)-statistics also measure how far away the observed data are from what we would expect under \(H_0\).
❓when sample statistics are compatible with \(H_0\), what type of \(\chi^2\)-statistics would you expect to see?
Common \(\chi^2\) tests:
Homogeneity: do two groups have the same distribution for one variable in the dataset?
Independence: are two variables in a dataset related?
Goodness of Fit: does one variable in the dataset match our expected distribution?
“I am satisfied with the quality of my healthcare”
| | Strong Agree | Agree | Neither | Disagree | Strong Disagree |
|---|---|---|---|---|---|
| Blue Shield | 15 | 10 | 40 | 20 | 15 |
Test of GOF: does the distribution of responses match our goal distribution of 20/25/30/15/10%?
| OBSERVED | Strong Agree | Agree | Neither | Disagree | Strong Disagree |
|---|---|---|---|---|---|
| Blue Shield | 15 | 10 | 40 | 20 | 15 |

| EXPECTED | Strong Agree | Agree | Neither | Disagree | Strong Disagree |
|---|---|---|---|---|---|
| Blue Shield | 100 * 0.2 | 100 * 0.25 | 100 * 0.3 | 100 * 0.15 | 100 * 0.1 |
\(\chi^2\)-statistic: \(\chi^2 = \sum \frac{(o-e)^2}{e}\)
Chi-squared test for given probabilities
data: c(15, 10, 40, 20, 15)
X-squared = 17.75, df = 4, p-value = 0.001381
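The reported statistic can be checked by hand from the observed and expected tables above:

```r
o <- c(15, 10, 40, 20, 15)              # observed counts
p <- c(0.20, 0.25, 0.30, 0.15, 0.10)    # goal distribution
e <- sum(o) * p                          # expected counts: 20, 25, 30, 15, 10

chi_sq <- sum((o - e)^2 / e)             # 17.75, matching chisq.test()
pchisq(chi_sq, df = length(o) - 1, lower.tail = FALSE)  # p ~ 0.00138
```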
“I am satisfied with the quality of my healthcare”
| | Strong Agree | Agree | Neither | Disagree | Strong Disagree |
|---|---|---|---|---|---|
| Blue Shield | 15 | 10 | 40 | 20 | 15 |
| United Healthcare | 10 | 5 | 20 | 40 | 25 |
Test of Homogeneity: Is the distribution of responses different for the two groups?
Pearson's Chi-squared test
data: summary_data
X-squared = 18.5, df = 4, p-value = 0.0009851
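The `summary_data` object passed to the test can be rebuilt directly from the table (the construction below is a sketch; the row/column labels are assumptions):

```r
# Rows = insurer, columns = response category
summary_data <- rbind(
  `Blue Shield`       = c(15, 10, 40, 20, 15),
  `United Healthcare` = c(10,  5, 20, 40, 25)
)
colnames(summary_data) <- c("Strong Agree", "Agree", "Neither",
                            "Disagree", "Strong Disagree")

# Pearson's chi-squared on the 2 x 5 table (no continuity correction
# is applied for tables larger than 2 x 2)
res <- chisq.test(summary_data)
unname(res$statistic)   # 18.5 with df = 4
```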
# is the proportion of heads different from 0.5? Do we have a fair coin?
prop.test(sum(dat),     # number of heads (prop.test expects a count, not mean(dat))
          length(dat),  # number of flips
          p = 0.5, alternative = "two.sided")
	1-sample proportions test with continuity correction
data:  sum(dat) out of length(dat), null probability 0.5
X-squared = 95.688, df = 1, p-value < 2.2e-16
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
2.015243e-05 5.594179e-02
sample estimates:
p
0.0059
# Set seed for reproducibility
set.seed(42)
# Define the expected color distribution (e.g., based on a typical Skittles bag)
expected_proportions <- c(red = 0.2, orange = 0.2, yellow = 0.2, green = 0.2, purple = 0.2)
# Simulate the actual observed counts of Skittles in a bag (e.g., a bag of 100 Skittles)
total_skittles <- 100
observed_counts <- rmultinom(1, total_skittles, prob = expected_proportions)[,1]
# print
print("Observed counts:")
[1] "Observed counts:"
red orange yellow green purple
26 24 15 20 15
# test
expected_counts <- total_skittles * expected_proportions
chisq_test <- chisq.test(x = observed_counts, p = expected_proportions)
print("Chi-square GOF test result:")
[1] "Chi-square GOF test result:"
Chi-squared test for given probabilities
data: observed_counts
X-squared = 5.1, df = 4, p-value = 0.2772
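As a sanity check, the statistic can be recomputed directly from the printed counts:

```r
o <- c(red = 26, orange = 24, yellow = 15, green = 20, purple = 15)
e <- rep(20, 5)                 # 100 Skittles, 5 equally likely colors

chi_sq <- sum((o - e)^2 / e)    # 5.1, matching chisq.test()
pchisq(chi_sq, df = 4, lower.tail = FALSE)  # ~0.277: consistent with H0
```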
# Set seed for reproducibility
set.seed(123)
# Define the number of students in each group
n_eecs <- 100
n_cads <- 100
# Define the expected distributions of undergraduate majors
prob_eecs <- c(math = 0.3, business = 0.1, humanities = 0.05,
computer_science = 0.4, data_science = 0.15)
prob_cads <- c(math = 0.2, business = 0.25, humanities = 0.1,
computer_science = 0.25, data_science = 0.2)
# Simulate data using multinomial distribution
observed_eecs <- rmultinom(1, n_eecs, prob = prob_eecs)[,1]
observed_cads <- rmultinom(1, n_cads, prob = prob_cads)[,1]
# Combine data into a table for the chi-square test
observed_data <- rbind(EECS = observed_eecs, CADS = observed_cads)
colnames(observed_data) <- c("Math", "Business", "Humanities",
"Computer Science", "Data Science")
# print
print("Observed data for EECS and CADS students:")
[1] "Observed data for EECS and CADS students:"
Math Business Humanities Computer Science Data Science
EECS 30 9 8 33 20
CADS 13 27 15 25 20
[1] "Chi-square test of homogeneity result:"
Pearson's Chi-squared test
data: observed_data
X-squared = 18.955, df = 4, p-value = 0.0008022
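Under homogeneity, expected counts come from pooling the two groups; the statistic above can be reproduced from the printed table:

```r
obs <- rbind(EECS = c(30, 9, 8, 33, 20),
             CADS = c(13, 27, 15, 25, 20))

# Expected count for cell (i, j) = row_total_i * col_total_j / grand_total
row_tot <- rowSums(obs); col_tot <- colSums(obs); n <- sum(obs)
exp_counts <- outer(row_tot, col_tot) / n

chi_sq <- sum((obs - exp_counts)^2 / exp_counts)
round(chi_sq, 3)   # 18.955, matching chisq.test()
```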